Exploiting Linguistic Cues to Classify Rhetorical Relations
Abstract
We propose a method for automatically identifying rhetorical relations. We use supervised machine learning but exploit cue phrases to automatically extract and label training data. Our models draw on a variety of linguistic cues to distinguish between the relations. We show that these feature-rich models outperform the previously suggested bigram models by more than 20%, at least for small training sets. Our approach is therefore better suited to deal with relations for which it is difficult to automatically label a lot of training data because they are rarely signalled by unambiguous cue phrases (e.g., continuation).
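The cue-phrase labelling idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue table `CUE_TO_RELATION`, the function `label_by_cue`, and the choice to drop the cue from the extracted spans are all assumptions made for illustration, since the abstract does not list the cues, relations, or extraction rules actually used.

```python
import re

# Hypothetical cue-phrase table (an assumption; the abstract does not
# specify which cues or relations were used).
CUE_TO_RELATION = {
    "because": "explanation",
    "but": "contrast",
    "although": "contrast",
}


def label_by_cue(sentence):
    """Split a sentence at the first matching cue phrase.

    Returns (left_span, right_span, relation), or None when no cue
    from the table occurs between two non-empty spans.
    """
    for cue, relation in CUE_TO_RELATION.items():
        # Match the cue as a whole word, optionally preceded by a comma.
        pattern = r",?\s+\b" + re.escape(cue) + r"\b\s+"
        match = re.search(pattern, sentence, flags=re.IGNORECASE)
        if match:
            left = sentence[:match.start()].strip()
            right = sentence[match.end():].strip().rstrip(".")
            if left and right:
                # The cue itself is dropped from both spans (assumption),
                # so a classifier trained on them must rely on other cues.
                return left, right, relation
    return None


examples = [
    "The match was cancelled because it rained.",
    "She likes tea, but he prefers coffee.",
    "Nothing to see here.",
]
labelled = [label_by_cue(s) for s in examples]
```

Sentences without a cue from the table yield `None` and would simply be skipped when building a training set this way.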
Similar resources
Using automatically labelled examples to classify rhetorical relations: an assessment
Being able to identify which rhetorical relations (e.g., contrast or explanation) hold between spans of text is important for many natural language processing applications. Using machine learning to obtain a classifier which can distinguish between different relations typically depends on the availability of manually labelled training data, which is very time-consuming to create. However, rheto...
Identifying The Linguistic Correlates Of Rhetorical Relations
RASTA (Rhetorical Structure Theory Analyzer), a system for automatic discourse analysis, reliably identifies rhetorical relations present in written discourse by examining information available in syntactic and logical form analyses. Since there is a many-to-many relationship between rhetorical relations and elements of linguistic form, RASTA identifies relations by the convergence of a number ...
Identification and Disambiguation of Lexical Cues of Rhetorical Relations across Different Text Genres
Lexical cues are linguistic expressions that can signal the presence of a rhetorical relation. However, such cues can be ambiguous as they may signal more than one relation or may not always function as a relation indicator. In this study, we first conduct a corpus-based analysis to derive a set of n-grams as potential lexical cues. These cues are then utilized in graph-based probabilistic mode...
OWL ontologies as a resource for discourse parsing
In the project SemDok (Generic document structures in linearly organised texts), funded by the German Research Foundation (DFG), a discourse parser for a complex type (scientific articles, for example) is being developed. Discourse parsing (henceforth DP) according to the Rhetorical Structure Theory (RST) (Mann and Taboada, 2005; Marcu, 2000) deals with automatically assigning a text a tree structu...
Using Hedges to Classify Citations in Scientific Articles
Citations in scientific writing fulfil an important role in creating relationships among mutually relevant articles within a research field. These inter-article relationships reinforce the argumentation structure intrinsic to all scientific writing. Therefore, determining the nature of the exact relationship between a citing and cited paper requires an understanding of the rhetorical relations ...